NetNews Offline 2

Text File | 1996-08-06 | 22.9 KB | 582 lines

Path: engnews1.Eng.Sun.COM!taumet!clamage
From: "Constantine Antonovich:" <const@Orbotech.Co.IL>
Newsgroups: comp.std.c++
Subject: operators new[]/delete[]
Date: 26 Feb 1996 16:07:51 GMT
Organization: Orbotech ltd.
Approved: clamage@eng.sun.com (comp.std.c++)
Message-ID: <313166CF.11C2@orbotech.co.il>
NNTP-Posting-Host: taumet.eng.sun.com
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-Nntp-Posting-Host: orange.orbotech.co.il
X-Mailer: Mozilla 2.0 (X11; I; SunOS 5.5 sun4c)
Originator: clamage@taumet

Several days ago I sent the following code to this newsgroup, asking whether the code is incorrect according to the current ANSI draft standard or whether I have a bug in my compiler:

//----------------------------------------------------------
#include <iostream.h>
#include <assert.h>
#include <new.h>

class A
{
public:
    A(void) { cout << "A constructed" << endl; }
    ~A() { cout << "A destructed" << endl; }
};

class B
{
public:
    B(void) { cout << "B constructed" << endl; }
    ~B() { cout << "B destructed" << endl; }
};

A* foo_allocate(unsigned size)
{
    assert(sizeof(A)==sizeof(B));
    B* bptr=new B[size];
    A* aptr=(A*)bptr;
    // A* aptr=reinterpret_cast<A*>(bptr); is more correct
    // but my compilers do not support that yet.

    for (unsigned j=size; j>0;)     // this place was corrected according
        (bptr+(--j))->~B();         // to a remark by a person whose name
                                    // I have lost, to my regret.
    for (unsigned i=0; i<size; ++i)
        new(aptr+i) A;

    return aptr;
}

int main(void)
{
    A* arr=foo_allocate(2);
    delete [] arr;                  // #here
    return 0;
}
//----------------------------------------------------------

The point I considered problematic is that one of my compilers treats the line marked "#here" as destruction of an array of B objects (the B class destructors are called).

I have received several answers (I also sent this code personally to Steve Clamage, and he kindly answered me) concluding that, according to the standard, the program contains operations with undefined results and so cannot be considered correct. Nevertheless, after some time of reflection, I am going to insist on the following:

-- the ANSI working papers (April, 1995) leave the correctness of the above-mentioned code open to interpretation;
-- if so, it is necessary to define this matter more precisely, to eliminate differences of interpretation between compiler vendors;
-- the code can be interpreted as having absolutely correct and well-defined behavior.

In the following text, I am going to argue for this point of view.

1. Little history of the code sample.

The above-mentioned code sample can be regarded as a play of imagination without any practical weight. However, this code was born from another one that makes a little more sense. Some time ago, I noticed periodically recurring discussions about the necessity of a renew operation in C++. Generally, I never felt deprived because renew does not exist, but seeing arguments for its usefulness again and again, I started to think: what is so hard about it that it cannot be implemented by means of the language itself?

So I wrote the following example, trying to avoid the redundant operations that usually occur when reallocation is done in the classic manner:

//----------------------------------------------------------
template<class T> class Allocator
{
private:
    // raw storage of the same size as T, with no constructor
    struct filler { char filler_[sizeof(T)]; };
public:
    static T* allocate_array(unsigned elm);
    static void dup_array(T* dst, T* src, unsigned elm);
    static void fill_array(T* dst, unsigned elm);
};

template<class T>
T* Allocator<T>::allocate_array(unsigned elm)
{
    return (T*)new filler[elm];
}

template<class T>
void Allocator<T>::dup_array(T* dst, T* src, unsigned elm)
{
    for (unsigned i=0; i<elm; ++i)
        new(dst+i) T(src[i]);       // copy constructor only
}

template<class T>
void Allocator<T>::fill_array(T* dst, unsigned elm)
{
    for (unsigned i=0; i<elm; ++i)
        new(dst+i) T;               // default constructor
}

template<class T>
T* realloc(T*& array, unsigned old_size, unsigned new_size)
{
    set_new_handler(0);
    T* tmp=Allocator<T>::allocate_array(new_size);
    if (tmp)
    {
        if (new_size>old_size)
        {
            Allocator<T>::dup_array(tmp,array,old_size);
            Allocator<T>::fill_array(tmp+old_size,new_size-old_size);
        }
        else
            Allocator<T>::dup_array(tmp,array,new_size);
        delete [] array;
        array=tmp;
    }
    return tmp;
}
//----------------------------------------------------------

The general idea was to allocate an array without initializing its objects and to use only the copy constructor to copy the old elements into the newly allocated array (avoiding redundant creation of objects with their default constructor when they are immediately replaced by a following assignment).
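
The same idea can be sketched in present-day C++. The sketch assumes that the array comes from a raw allocation function rather than from new T[n], so releasing it does not depend on how delete [] is interpreted. The names raw_allocate, destroy_and_free, renew and Counted are hypothetical and do not come from the code above; exception safety is left out to keep the sketch short.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical helper: obtain raw, uninitialized storage for n objects
// of type T. No constructor runs here.
template <class T>
T* raw_allocate(std::size_t n)
{
    return static_cast<T*>(::operator new(n * sizeof(T)));
}

// Hypothetical helper: destroy n objects in reverse order, then release
// the storage with the deallocation function matching raw_allocate.
template <class T>
void destroy_and_free(T* p, std::size_t n)
{
    for (std::size_t i = n; i > 0;)
        p[--i].~T();
    ::operator delete(p);
}

// Hypothetical renew: copy-construct the surviving elements into new
// storage, construct any additional ones, and release the old block.
// Exception safety is deliberately ignored in this sketch.
template <class T>
T* renew(T* old_array, std::size_t old_size, std::size_t new_size)
{
    T* fresh = raw_allocate<T>(new_size);
    std::size_t kept = old_size < new_size ? old_size : new_size;
    for (std::size_t i = 0; i < kept; ++i)
        new (fresh + i) T(old_array[i]);   // copy constructor only
    for (std::size_t i = kept; i < new_size; ++i)
        new (fresh + i) T();               // value-initialize the tail
    destroy_and_free(old_array, old_size);
    return fresh;
}

// Small instrumented type used to observe which constructors run.
struct Counted
{
    static int default_ctor_calls;
    static int copy_ctor_calls;
    static int live;
    int value;
    Counted() : value(0) { ++default_ctor_calls; ++live; }
    Counted(const Counted& other) : value(other.value)
    {
        ++copy_ctor_calls;
        ++live;
    }
    ~Counted() { --live; }
};
int Counted::default_ctor_calls = 0;
int Counted::copy_ctor_calls = 0;
int Counted::live = 0;
```

Growing a two-element block to four elements with renew runs the copy constructor twice and the default constructor only for the two new slots; no element is first default-constructed and then overwritten by assignment.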

There are no problems in using such a technique in container classes, where all memory and object management is completely hidden from the user, but the imaginary renew operation should be applicable to regular arrays, like:

//----------------------------------------------------------
A* ap=new A[4];
realloc(ap,4,8);
delete [] ap;
//----------------------------------------------------------

Obviously, the code that has to be contained in the Allocator<T>::allocate_array function produces the mentioned problem.

2. Undefined behavior.

Of course, some constructions in a possible program may produce undefined behavior. Nevertheless, even undefined behavior should have some definition. Let's consider the following code:

//----------------------------------------------------------
A* ap=new A;
B* bp=reinterpret_cast<B*>(ap);
delete bp;  // #here
//----------------------------------------------------------

Without any doubt, the result of the code executed in the "#here" line is undefined. But I definitely wouldn't like my compiler, as a result of the uncertainty of the behavior, to send an email complaint to some League of "C++ compilers against the stupidity of programmers". Nor would I like my compiler to recognize the incorrectness of the code and silently call the A destructor (after all, bp points to an A object, doesn't it?) instead of the B one.

Here I mean that UNDEFINED BEHAVIOR HAS THE MEANING OF AN ERROR. UNDEFINED BEHAVIOR THAT ALWAYS RESULTS IN CORRECT EXECUTION OF CODE WITH UNDEFINED BEHAVIOR IS FORBIDDEN.

In the previous example, we are dealing with code with undefined behavior. The code is obviously incorrect. However, an imaginary compiler able to recognize the true type of an object could call the A destructor in the line marked "#here" (because its behavior would be undefined anyway). In such a case, the invalid code would be executed correctly in every case (ALWAYS), and so the behavior of the compiler cannot be considered proper.

C++, partly by itself and partly as the heir of C, stands on the principle of not restricting a programmer's actions on grounds of correctness as long as they are syntactically correct. So, in the example, in the line marked "#here", the imaginary compiler should honestly try to destroy a B object and free its memory. Applying the B destructor to an A object will most probably cause "undefined behavior", but its harm will depend on the particular A and B classes (and obviously such an application will not ALWAYS be harmless).

3. No kidding.

C++ is not a wizard language. Generally, its behavior is understandable, clear enough and well predictable. Creation of a C++ object consists of two parts: memory allocation and object construction itself. Even if such separation into two independent parts is not obvious and is not stated directly by the ANSI draft, that doesn't change anything, because such separation follows from the language definition anyway.

C++ memory management is ALMOST ALWAYS typeless. Here "ALMOST ALWAYS" stands for all cases covered by a standard-conforming language implementation, except for a denumerable number of cases where a programmer explicitly changes the language behavior by means of language constructions (and, I should add, on his own responsibility). To illustrate the term, consider the following example:

//----------------------------------------------------------
A* ap=new A;
//----------------------------------------------------------

What does the code do? Obviously, it creates a new object of type A. Yes, but I should say it creates a new object of type A ALMOST ALWAYS, because the definition of class A may be as follows:

//----------------------------------------------------------
class A
{
    // some stuff
public:
    // some stuff
    void* operator new(unsigned) { exit(1); } // not for heap usage
};
//----------------------------------------------------------

and in such a case the previous statement obviously will not create any A object.
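
The two parts of object creation named above, memory allocation and object construction, can be exercised separately. The following is a minimal sketch; Widget and the three helper functions are hypothetical names introduced only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical example type; counts how many objects are currently alive.
struct Widget
{
    static int live;
    int id;
    explicit Widget(int i) : id(i) { ++live; }
    ~Widget() { --live; }
};
int Widget::live = 0;

// Stage 1 of creation: typeless memory allocation, no constructor runs.
inline void* allocate_for_widget()
{
    return ::operator new(sizeof(Widget));
}

// Stage 2 of creation: construction in already allocated storage.
inline Widget* construct_widget(void* storage, int id)
{
    return new (storage) Widget(id);
}

// Destruction mirrors creation: destructor first, then deallocation.
inline void destroy_widget(Widget* w)
{
    w->~Widget();
    ::operator delete(w);
}
```

After allocate_for_widget returns, storage exists but no Widget does; the object only comes into being when construct_widget runs the constructor in that storage.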

C++ memory management is ALMOST ALWAYS typeless because the default operators new and new[] have no knowledge about the type of the object they allocate memory for. On the other hand, these operators are the only C++ mechanism for managing memory. Moreover, this and only this part of object creation can be completely overloaded by a programmer, and that completely separates it into an independent stage of object creation. Object construction has hidden features (like initialization of virtual function tables) and can be influenced by a programmer only partly (through constructors and destructors).

In addition, the introduction of "placement new" into the ANSI standard completed the separation of object construction into an independent part, since an object can be created with no allocation of memory (at any place chosen by the programmer, not only on the stack) and can be legally destroyed with no freeing of memory (by a direct call to its destructor).

If we recall that, according to C++ principles, there should be no difference between objects and their behavior regardless of their placement, we have to agree that allocation of memory for an object and construction of the object in the allocated memory represent two independent stages ALMOST ALWAYS.

Taking all this into account, even the definitions of operators new and delete can be reconsidered to eliminate a number of duplicated definitions, for example:

    new T(<arg-list>);

represents shorthand for the sequence

    new(::operator new(sizeof(T))) T(<arg-list>);

or

    new(T::operator new(sizeof(T))) T(<arg-list>);

if T::operator new is defined.

    delete tp;  // where tp is a non-null pointer to an object of type T

represents shorthand for the atomic sequence

    if (tp) { tp->~T(); ::operator delete(<cast-to-mostly-base>tp); }

or

    if (tp) { tp->~T(); T::operator delete(<cast-to-mostly-base>tp); }

if T::operator delete is defined.

Actually, a similar redefinition can be done for new[]/delete[] as well.

4. Alignment and memory allocation.

In the example that starts this article, the following code

//----------------------------------------------------------
assert(sizeof(A)==sizeof(B));
B* bptr=new B[size];
A* aptr=(A*)bptr;
//----------------------------------------------------------

really seems dangerous. Fergus Henderson writes:

"This assertion is not guaranteed to succeed. It would take an extremely perverse implementation for it to fail, however, so I think it would be very portable, even though it is not strictly guaranteed to work."

This sentence seems reasonable but, indeed, this assertion guarantees correctness ALMOST ALWAYS, and under that circumstance this check is absolutely portable. The ANSI draft says:

18.4.1.1 Single-object forms
Effects: The allocation function called by a new-expression to allocate size bytes of storage suitably aligned to represent any object of that size.

18.4.1.2 Array forms
Effects: The allocation function called by the array form of a new-expression to allocate size bytes of storage suitably aligned to represent any array object of that size or smaller.32)

We see that the ANSI draft says the allocated memory should be suitably aligned for any object and any array object, with size as the single limitation. We can also recall C++ pointer arithmetic and what the sizeof of a particular object is (it includes the concept of alignment in the object itself).

An implementation does not have to be extremely perverse for the assertion condition to fail. It can be a very simple one where, for example, class B has its own operator new[] that allocates memory with a specific alignment suitable for B but not for any other class, and not for A in particular (and even that is impossible with compilers that still do not support overloading of operator new[]). But this situation is exactly the "ALMOST ALWAYS" case.
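
The alignment claim about storage returned by the default allocation function can be checked directly. The following is a minimal sketch that assumes a C++11 compiler (for alignof and std::uintptr_t, neither of which existed when this was written). The structs A and B are simplified stand-ins for the classes of the article, and is_aligned_for and all_slots_aligned_for are hypothetical helper names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// Hypothetical stand-ins for the classes A and B of the article,
// given the same size but different member types.
struct A { unsigned u; };
struct B { int i; };

// True when p is a multiple of the alignment required by T.
template <class T>
bool is_aligned_for(const void* p)
{
    return reinterpret_cast<std::uintptr_t>(p) % alignof(T) == 0;
}

// Checks every element slot of a block holding n objects of size sizeof(T).
template <class T>
bool all_slots_aligned_for(const void* block, std::size_t n)
{
    const unsigned char* bytes = static_cast<const unsigned char*>(block);
    for (std::size_t i = 0; i < n; ++i)
        if (!is_aligned_for<T>(bytes + i * sizeof(T)))
            return false;
    return true;
}
```

With a block obtained from the global ::operator new, both all_slots_aligned_for<A> and all_slots_aligned_for<B> are expected to hold. A class-specific operator new[] with a narrower alignment guarantee is the case where the check could fail.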

If I take the responsibility of overloading operator new[] for a specific class, it is also my responsibility to take care with operations like the one I am doing under the condition of equal object sizes. The language should stand on "ALMOST ALWAYS" correctness (and de facto it does). If a programmer is doing something that is ALMOST ALWAYS correct, the language should behave as if it were ALWAYS correct. Since an ALMOST ALWAYS correct action may become incorrect only as a result of the programmer's own activity, it is also the responsibility of the programmer to take care when using such actions.

Fergus Henderson continues:

//----------------------------------------------------------
B* bptr=new B[size];
A* aptr=(A*)bptr;
//----------------------------------------------------------

"This cast has unspecified behavior. (See 5.2.9 [expr.cast.reinterpret]/8.). However, I would expect it to work on most implementations."

This article of the ANSI draft treats the operation as unspecified in the case of a cast from T1 to T2 and back, if there is a difference in alignment between T1 and T2. Obviously, that is not our case (at least because an allocation function is defined as returning storage suitably aligned for any object).

5. Rest in peace.

The following piece of the code has been considered clear by all experts:

//----------------------------------------------------------
for (unsigned j=size; j>0;)
    (bptr+(--j))->~B();
for (unsigned i=0; i<size; ++i)
    new(aptr+i) A;
//----------------------------------------------------------

6. Undefined behavior (continue).

All experts have considered deletion of the array allocated in such a strange manner to be the most incorrect point, with undefined behavior. Steve Clamage writes:

"You do have an operation with undefined results, however. In effect you are doing this:

    A* p = (A*)new B[2];
    delete [] p;

The rule is that the type of the pointer passed to delete[] must match the type of the pointer returned by new[], which is not the case here. The compiler is not required to diagnose the error, and the language definition does not say what the result is."

Definitely, I am not doing that. If the standard of the language allows my code to be interpreted in such a manner, then there is something wrong with the standard! But even if the behavior is proclaimed to be undefined, I would like to recall what I said in section 2. If the uncertainty of the behavior were properly defined, then the code wouldn't have undefined behavior! (It would be very interesting to test the code on some other compilers. I suppose that the code has, indeed, sufficiently defined behavior de facto, as a result of the most logical implementation of operators new/delete, and that it is just SPARCompiler C++ that, for an unknown reason, stores a pointer to the destructor function together with the array size.)

Fergus Henderson agrees with Steve Clamage:

"This has undefined behaviour. It contravenes 5.3.5 [expr.delete]/2, which says that the expression passed to `delete []' must be a pointer to the first element of an array of objects allocated with `new []'; this is not the case, because although there once was such an array at that memory location, its lifetime ended when the memory was overwritten by the calls to placement new (see 3.8[basic.life]/1)."

There is at least one self-contradictory point in that conclusion. Of course, the lifetime of all the B objects has ended, but why does that mean the end of the array's life? Or, on the contrary, if the end of the lifetime of the B objects means the end of the lifetime of the array, then creation of the A objects should mean creation of a new array, shouldn't it?

Article 5.3.5.2 of the ANSI draft says something slightly different:

"...In the second alternative (delete array), the value of the operand of delete shall be a pointer to an array created by a new-expression without a new-placement specification. If not, the behavior is undefined."

So delete takes as its argument a POINTER TO AN ARRAY (objects are not even mentioned). Nowhere does it say that

    ...pointer passed to delete[] must match the type of the pointer returned by new[]...
    ...the expression passed to `delete []' must be a pointer to the first element of an array of objects allocated with `new []'...

All of that is already INTERPRETATION of the standard, and it also shows that the standard permits such interpretations. Meanwhile, C++ memory management does not seem to need such strong restrictions, simply because memory can be managed separately from object construction/destruction and can be reused without reallocation. I agree that everything said above regarding operators new/delete is the point of view of common sense (one should use delete and delete[] with the same pointer of the same type obtained from new and new[], and should not play with the pointers in 99.9999% of the cases where one uses new/delete at all, and in 100% of the cases if one doesn't understand what one is doing), but that has nothing in common with the boundaries of proper language processing.

And here we really arrive at the final point. The ANSI draft gives no strong array definition that would rule out ambiguous array interpretation. And the above-mentioned common-sense understanding of arrays has every right to exist.

7. No kidding (continue).

I suppose that this ambiguity in the interpretation of arrays and operators new/delete should be eliminated from the standard.

I would propose the following additions, on the supposition that they:

-- do not conflict with any of the standard's previous definitions;
-- do not change anything in the standard's common principles or in the common understanding of the standard, except for a very specific point with no influence upon the standard itself;
-- have little influence on existing implementations of the language, since some implementations use this idea de facto and others can easily be corrected;
-- have little influence on existing C++ applications, because they concern a very specific point in the standard whose usage is uncommon and extremely rare (if it occurs at all);
-- will make the standard more logically complete.

Addition to array definition [dcl.array]:

    An array of N T objects represents a contiguous amount of memory of suitable size and alignment, with N non-overlapping objects of type T placed into the memory with no gaps and each properly aligned.

Addition to operator delete [expr.delete] ("above" here means everything previously said by the standard):

    In either alternative, the type of the deleted object is evaluated as described above and according to the type of the actual operand.

In my opinion, the additions may be considered excessive, but experience shows they are not.

8. Renew

In my opinion, the previously mentioned additions would make the example that starts this article absolutely clear and ALMOST ALWAYS correct with no discussion (I continue to insist that the example is so even now, but with discussion). But what about renew? The C++ standard accepts new keywords and new features with great difficulty.
But maybe it makes sense, at least for completeness, to add to the standard's (fairly) new family of various_cast<T> something like the following:

dynamic_sizeof(object); // should return sizeof evaluated at
                        // run time, for example:
//----------------------------------------------------------
class A
{
public:
    unsigned u;
    A();
    virtual ~A();
};

class B : public A
{
public:
    int i;
    B();
    ~B();
};

int main(void)
{
    A* a=new B;
    cout << sizeof(A) << endl;           // prints 8
    cout << sizeof(B) << endl;           // prints 12
    cout << sizeof(*a) << endl;          // prints 8
    cout << dynamic_sizeof(*a) << endl;  // prints 12
    return 0;
}
//----------------------------------------------------------

array_sizeof(pointer); // should return sizeof of an array, for example
//----------------------------------------------------------
class A
{
public:
    unsigned u;
    A();
};

int main(void)
{
    A* a=new A;
    A ar[2];
    A* ap=new A[4];
    cout << array_sizeof(a)/sizeof(A) << endl;   // prints 0
    cout << array_sizeof(ar)/sizeof(A) << endl;  // prints 2
    cout << array_sizeof(ap)/sizeof(A) << endl;  // prints 4
    return 0;
}
//----------------------------------------------------------

I suppose that the use of arrays of objects with no destructors (where a compiler may optimize away storage of the number of elements) has become rare enough in contemporary C++ (and in a lot of cases the number of elements in such arrays is already known at compilation time), so the language could fairly easily provide a programmer with such information as the number of elements in an array.

--
//------------------------------------------------------------------
// Opinions expressed here are my own only
// Constantine Antonovich                    const@orbotech.co.il
//------------------------------------------------------------------
[ To submit articles: Try just posting with your newsreader.
  If that fails, use mailto:std-c++@ncar.ucar.edu
  FAQ: http://reality.sgi.com/employees/austern_mti/std-c++/faq.html
  Policy: http://reality.sgi.com/employees/austern_mti/std-c++/policy.html
  Comments? mailto:std-c++-request@ncar.ucar.edu
]